Assignment 2: Spatial Analysis and Visualization

Healthcare Access and Equity in Pennsylvania

Author

Tess Vu

Published

October 14, 2025

PART I: HEALTHCARE ACCESS FOR VULNERABLE POPULATIONS

A. Research Question

Which Pennsylvania counties have the highest proportion of vulnerable populations (elderly + low-income) living far from hospitals?

Step 1: Data Collection

  • How many hospitals are in the dataset?

    • 223 hospitals.
  • How many census tracts?

    • 3,445 census tracts.
  • What coordinate reference system is each dataset in?

    • PA county boundaries CRS: GCS WGS 84 (Pseudo-Mercator) / EPSG

    • PA census tracts CRS: PCS NAD 83 / EPSG 4269.

    • PA hospitals CRS: GCS WGS 84 (Pseudo-Mercator) / EPSG 4326.

Step 2: Get Demographic Data

Code
# Get age demographic data from ACS.
tract_ages <- get_acs(
  geography = "tract",
  variables = c(
    "B01001_020E", "B01001_021E", "B01001_022E",
    "B01001_023E", "B01001_024E", "B01001_025E",
    "B01001_044E", "B01001_045E", "B01001_046E",
    "B01001_047E", "B01001_048E", "B01001_049E"
  ),
  state = "PA",
  year = 2020,
  output = "wide"
)
 # Get population and income data from ACS.
tract_demographics <- get_acs(
  geography = "tract",
  variables = c(
    total_pop = "B01003_001",
    median_income = "B19013_001"
  ),
  state = "PA",
  year = 2020,
  output = "wide"
)

# Combine 65+ age estimates.
tract_demographics$age_65_overE <-
  tract_ages$B01001_020E + tract_ages$B01001_021E + tract_ages$B01001_022E +
  tract_ages$B01001_023E + tract_ages$B01001_024E + tract_ages$B01001_025E +
  tract_ages$B01001_044E + tract_ages$B01001_045E + tract_ages$B01001_046E +
  tract_ages$B01001_047E + tract_ages$B01001_048E + tract_ages$B01001_049E

# Combine 65+ age MOEs.
tract_demographics$age_65_overM <-
  tract_ages$B01001_020M + tract_ages$B01001_021M + tract_ages$B01001_022M +
  tract_ages$B01001_023M + tract_ages$B01001_024M + tract_ages$B01001_025M +
  tract_ages$B01001_044M + tract_ages$B01001_045M + tract_ages$B01001_046M +
  tract_ages$B01001_047M + tract_ages$B01001_048M + tract_ages$B01001_049M

# Remove unneeded column.
tract_demographics$TRACT <- NULL

# Rename tract column.
tract_demographics <- rename(tract_demographics, TRACT = NAME)

# Join to tract boundaries
pa_tracts <- left_join(pa_tracts, tract_demographics, by = "GEOID")

# Summary statistics for reference.
#summary(pa_tracts[c("total_popE", "total_popM",
#                    "median_incomeE", "median_incomeM",
#                    "age_65_overE", "age_65_overM")])
  • What year of ACS data is being used?

    • 2020 ACS data.
  • How many tracts have missing income data?

    • 64 tracts are missing income estimates and 68 tracts are missing income MOEs.
  • What is the median income across all PA census tracts?

    • $61,691.

Step 3: Define Vulnerable Populations

Code
# Filter for vulnerable tracts.
# Normalize elderly population.
pa_tracts$pct_elderly <- (pa_tracts$age_65_overE / pa_tracts$total_popE) * 100

# Get 3rd quantile of percent elderly.
elderly_q3 <- quantile(pa_tracts$pct_elderly, probs = 0.75, na.rm = TRUE)
# Get 1st quantile of income estimates.
income_q1 <- quantile(pa_tracts$median_incomeE, probs = 0.25, na.rm = TRUE)

# Determine criteria for vulnerable tracts.
pa_tracts <- pa_tracts %>%
  mutate(
    # Low income for individuals equal or below the 1st quartile.
    low_income = case_when(median_incomeE <= income_q1 ~ TRUE,
                           median_incomeE > income_q1 ~ FALSE),
    # High elderly percent when tracts are equal or above the 3rd quartile.
    high_elderly = case_when(pct_elderly >= elderly_q3 ~ TRUE,
                             pct_elderly < elderly_q3 ~ FALSE),
    # Considered vulnerable when both of the above cases are true.
    vulnerable = case_when(low_income == TRUE & high_elderly == TRUE ~ TRUE,
                           low_income == FALSE & high_elderly == FALSE ~ FALSE,
                           low_income == TRUE & high_elderly == FALSE ~ FALSE,
                           low_income == FALSE & high_elderly == TRUE ~ FALSE,)
  )

# Create a summary for personal reference on vulnerable statistics.
vulnerable_summary <- pa_tracts %>%
  group_by(vulnerable) %>%
  summarize(
    "Number of Tracts" = n(),
    "Median Income" = median(median_incomeE, na.rm = TRUE),
    "Percent Elderly" = median(pct_elderly, na.rm = TRUE),
    "Percent Vulnerable" = (n() / 3445) * 100
  )

# Remove geometry column for kable table.
vulnerable_summary$geometry <- NULL

# Create professional table.
kable(
  vulnerable_summary,
  col.names = c("Vulnerable", "Number of Tracts", "Median Income", "Percent Elderly", "Percent Tracts"),
  digit = 2,
  caption = "<b>PENNSYLVANIA: VULNERABLE TRACTS</b>",
  format.args = list(big.mark = ",")
) %>%
  kable_styling(latex_options = "striped") %>%
  column_spec(1, bold = TRUE) %>%
  row_spec(0, color = "white", background = "black")
PENNSYLVANIA: VULNERABLE TRACTS
Vulnerable Number of Tracts Median Income Percent Elderly Percent Tracts
FALSE 3,209 63,322.0 17.88 93.15
TRUE 172 40,340.5 26.26 4.99
NA 64 NA 3.27 1.86
  • What income threshold was chosen and why?

    • income threshold I chose was for the amount equal to or below 25%, or first quartile of the median income data. I believe it’s pertinent because many other factors come into play with income and affordability, especially when it comes to older populations, who have different forms of income that contribute to their day-to-day livelihoods. In addition, many elderly may have significant debts like those related to medical expenses that put them in more financial risk, so the 25% is a broader net that’s an attempt to capture the idiosyncracies of affordability and economic struggles in the US.
  • What elderly population threshold was chosen and why?

    • I used the third quantile for the elderly population percentage in PA, this means I’m targetting tracts with a larger proportion of older individuals that paint a picture of an aging populace in the areas. With an aging population, and younger populations generally moving to other regions with economic opportunity, elderly tracts’ public services may take a hit with out-migrating young workers and be more susceptible to physical and other health-related vulnerabilities.
  • How many tracts meet the vulnerability criteria?

    • 172 tracts meet the vulnerability criteria.
  • What percentage of PA census tracts are considered vulnerable by your definition?

    • Approximately 4.99% of PA’s census tracts are considered vulnerable.

Step 4: Calculate Distance to Hospitals

Code
# Filter PA tracts by vulnerability.
vulnerable_tracts <- pa_tracts %>%
  filter(vulnerable == TRUE)

# Transform to appropriate projected CRS.
vulnerable_tracts <- st_transform(vulnerable_tracts, crs = 5070)
pa_tracts <- st_transform(pa_tracts, crs = 5070)
pa_hospitals <- st_as_sf(pa_hospitals,
                         coords = c("longitude", "latitude"),
                         crs = 4326)
pa_hospitals <- st_transform(pa_hospitals, 5070)

# Calculate distance from each tract centroid to nearest hospital.
# Create vulnerable tract centroids.
vulnerable_centroids <- st_centroid(vulnerable_tracts)

# Create a list of indexes of hospitals closest to vulnerable centroids
nearest_hospital_index <- st_nearest_feature(vulnerable_centroids, pa_hospitals)

# Create points from the hospital index.
nearest_hospital_point <- pa_hospitals[nearest_hospital_index,]

# Calculate distance between centroid and nearest hospital.
hospital_distance <- st_distance(vulnerable_centroids, nearest_hospital_point, by_element = TRUE)

# Set miles units.
vulnerable_tracts$nearest_hospital_mi <- as.numeric(set_units(hospital_distance, "mi"))

# Summary statistics for reference.
#summary(vulnerable_tracts[c("total_popE", "total_popM",
#                            "median_incomeE", "median_incomeM",
#                            "age_65_overE", "age_65_overM",
#                            "pct_elderly", "low_income",
#                            "high_elderly", "vulnerable")])

# Determine how many tracts have hospitals more than 15 miles away.
underserved_number <- sum(vulnerable_tracts$nearest_hospital_mi > 15)

print(paste0("There are ", underserved_number, " census tracts that are underserved."))
[1] "There are 11 census tracts that are underserved."

I chose Albers Equal Area because it was appropriate when calculating hospital distances all over the state, if I’d chosen a PCS that was more localized like State Planes, then depending whichever cardinal direction I selected (i.e. North or South Pennsylvania), it’d have some distortion in other regions of Pennsylvania.

  • What is the average distance to the nearest hospital for vulnerable tracts?

    • 4.78 miles.
  • What is the maximum distance?

    • 27.76 miles.
  • How many vulnerable tracts are more than 15 miles from the nearest hospital?

    • 11 census tracts of the 172 total vulnerable tracts are more than 15 miles from the nearest hospital.

Step 5: Identify Underserved Areas

Code
# Filter tracts with hospitals more than 15 miles away.
underserved_tracts <- vulnerable_tracts %>%
  filter(nearest_hospital_mi > 15)

# Create underserved column when hospitals are more than 15 miles away.
vulnerable_tracts <- vulnerable_tracts %>% mutate(
  underserved = case_when(
    vulnerable_tracts$nearest_hospital_mi > 15 ~ TRUE,
    TRUE ~ FALSE))

# Calculate percent of underserved tracts by dividing it with vulnerable tracts.
underserved_percent <- (nrow(underserved_tracts) / nrow(vulnerable_tracts)) * 100

print(sprintf("%.2f%% of vulnerable census tracts are underserved.", underserved_percent))
[1] "6.40% of vulnerable census tracts are underserved."
  • How many tracts are underserved?

    • 11 census tracts are underserved.
  • What percentage of vulnerable tracts are underserved?

    • 6.4% of vulnerable census tracts are underserved.
  • Is this surprising? Why or why not?

    • Not quite. I’m from Utah and the disparity with hospital access is expected, especially since it’s twice the size of Pennsylvania. I know in the US’ East Coast I expect that the states have better hospital coverage given that they were also established as states over a century earlier than Utah, so they’ve had that time to construct more expansive infrastructure compared. That being said, it’s still important to address the lacking access of those who are underserved, even if it seems like a small percent of the population.

Step 6: Aggregate to County Level

Code
# Spatial join tracts to counties.
# Remove the " County" substring from the NAMELSADCO column values.
vulnerable_tracts <- vulnerable_tracts %>%
  mutate(NAMELSADCO = str_remove_all(NAMELSADCO, " County"))

# Change the values in NAMELSADCO to all capitalized for later joining.
vulnerable_tracts$NAMELSADCO <- toupper(vulnerable_tracts$NAMELSADCO)

# Align CRS of both data before spatial join.
vulnerable_tracts <- st_transform(vulnerable_tracts, crs = 4326)
pa_counties <- st_transform(pa_counties, crs = 4326)

# Spatial left join.
county_tracts <- st_join(vulnerable_tracts, pa_counties, left = TRUE)

# Aggregate statistics by county
aggregate_summary <- county_tracts %>%
  st_drop_geometry() %>% # Drop the geometry.
  group_by(NAMELSADCO) %>% # Group by county.
  summarize(
    # Summarize the number of vulnerable and underserved people.
    number_vulnerable = sum(vulnerable),
    number_underserved = sum(underserved),
    # Summarize the percent of underserved people.
    percent_underserved = (number_underserved / number_vulnerable) * 100,
    # Summarize the average distance to the nearest hospital.
    avg_nearby_hospital_dist = mean(nearest_hospital_mi, na.rm = TRUE),
    # Summarize the total estimated population.
    total_vulnerable_pop = sum(total_popE)
  ) %>%
  # Rename the NAMELSADCO column to county.
  rename(
    county = NAMELSADCO
  )

# Change COUNTY_NAM to county.
pa_counties <- rename(pa_counties, county = COUNTY_NAM)

# Conduct a left join.
pa_mapping <- left_join(pa_counties, aggregate_summary, by = "county")
Code
# Plot map of percent underserved by Pennsylvania counties.
ggplot(pa_mapping) +
  geom_sf(aes(fill = percent_underserved), color = "white", size = 0.5) +
  scale_fill_viridis_c(
    name = "Percent Underserved",
    option = "magma",
    # Format to show percent sign on color bar.
    labels = function(x) paste0(x, "%")
  ) +
  labs(
    title = "Most Underserved Counties",
    subtitle = "Pennsylvania, United States",
    caption = "Source: ACS 2019–2023"
  ) +
  theme_void()

  • Which 5 counties have the highest percentage of underserved vulnerable tracts?

    • Sullivan, Cameron, Forest, Clearfield, and Juniata at 100% for the first three counties then 82.22% and 71.43% for the latter two counties.
  • Which counties have the most vulnerable people living far from hospitals?

    • Counties with an average hospital distance of more than 15 miles and with the most vulnerable people in that group are Bradford, Potter, Sullivan, Cameron, and Juniata that have average hospital distances more than 15 miles. Bradford consists of 1 census tract and has the least number of vulnerable people at 2,833 and Cameron consists of 7 census tracts and has the most at 17,052 vulnerable people. Other counties fall between the two ranges, but below 10,000 for the vulnerable population, so Cameron is a significant outlier.
  • Are there any patterns in where underserved counties are located?

    • It looks like more of the underserved counties are in the northern portion of Pennsylvania, and seemingly clustered toward the middle in addition. I also know those areas are peppered with several forested regions like national and state forests. Development may be much more restricted due to that. There are also underserved counties at the fringes of Pennsylvania, and several counties noted as NA due to the fact that the census tracts within them were part of the 64 NA tracts.

Step 7: Create Summary Table

Code
# Create and format priority counties table.
kable_table <- pa_mapping[, c("county", "number_vulnerable",
                                "number_underserved", "percent_underserved",
                                "avg_nearby_hospital_dist", "total_vulnerable_pop")]

# Drop geometry and rename the columns.
kable_table <- kable_table %>%
  st_drop_geometry() %>%
  rename(
    "County" = county,
    "Number of Vulnerable Tracts" = number_vulnerable,
    "Number of Underserved Tracts" = number_underserved,
    "Percent Underserved" = percent_underserved,
    "Average Mile(s) to Nearest Hospital" = avg_nearby_hospital_dist,
    "Total Vulnerable People" = total_vulnerable_pop
  )

# Filter and arrage the top 10 underserved counties by percent.
kable_table <- kable_table %>%
  arrange(desc(`Percent Underserved`)) %>%
  slice(1:10)

# Format percent column to have % sign.
kable_table <- kable_table %>%
  mutate(
    # Format so percent and distance values have units.
    `Percent Underserved` = paste0(round(`Percent Underserved`, 2), "%"),
    `Average Mile(s) to Nearest Hospital` = paste0(round(`Average Mile(s) to Nearest Hospital`, 2), " mi")
  )

# Create kable.
kable(
  kable_table,
  # Center align.
  align = "c",
  # Limit percents to two decimal places.
  digit = 2,
  # Make sure numbers are separated by thousands places.
  format.args = list(big.mark = ","),
  caption = "<b>TOP 10 UNDERSERVED PA COUNTIES: For Priority Healthcare Investment</b>"
) %>%
  # Shade every other row for ease of reading.
  kable_styling(bootstrap_options = "striped") %>%
  column_spec(1, bold = TRUE) %>%
  row_spec(0, color = "mistyrose", background = "red3")
TOP 10 UNDERSERVED PA COUNTIES: For Priority Healthcare Investment
County Number of Vulnerable Tracts Number of Underserved Tracts Percent Underserved Average Mile(s) to Nearest Hospital Total Vulnerable People
SULLIVAN 6 6 100% 20.5 mi 8,116
CAMERON 6 6 100% 19.31 mi 15,048
FOREST 4 4 100% 18.04 mi 10,108
CLEARFIELD 6 5 83.33% 15.76 mi 15,831
JUNIATA 7 5 71.43% 14.06 mi 17,193
POTTER 5 3 60% 14.61 mi 11,862
BRADFORD 2 1 50% 9.36 mi 8,715
CRAWFORD 2 1 50% 15.37 mi 4,018
BUCKS 2 0 0% 4.02 mi 13,326
TIOGA 3 0 0% 5.88 mi 10,515

PART II: COMPREHENSIVE VISUALIZATION

Map 1: County-Level Choropleth

Code
# Create county-level access map.
# Align CRS.
pa_hospitals <- st_transform(pa_hospitals, crs = 4326)
pa_mapping <- st_transform(pa_mapping, crs = 4326)

# Choropleth map of PA counties.
choropleth_pa <- ggplot(pa_mapping) +
  geom_sf(
    # Color by underserved percentage.
    aes(fill = percent_underserved),
    color = "white",
    linewidth = 0.25
  ) +
  scale_fill_viridis_c(
    # Color bar customization.
    name = "Percent Underserved",
    option = "viridis",
    # Format to show percent sign in color bar.
    labels = function(x) paste0(x, "%")
  ) +
  geom_point(
    data = pa_hospitals,
    aes(x = LONGITUDE, y = LATITUDE, color = "Hospitals"),
    alpha = 0.75, inherit.aes = FALSE
  ) +
  scale_color_manual(
    name = "Hospitals",
    values = c("Hospitals" = "red")
  ) +
  labs(
    title = "UNDERSERVED COUNTIES & HOSPITAL LOCATIONS",
    subtitle = "Pennsylvania, United States",
    caption = "Source: 5-Year American Community Survey (ACS) 2019–2023",
  ) +
  theme_void()

choropleth_pa + theme(
  legend.box.margin = margin(l = 10, unit = "pt"),
  legend.title = element_text(face = "bold"),
  legend.text = element_text(family = "mono"),
  plot.title = element_text(face = "bold"),
  plot.subtitle = element_text(margin = margin(b = 10, t = 5, unit = "pt")),
  plot.caption = element_text(margin = margin(t = 10, unit = "pt"), face = "italic", family = "mono")
)

Map 2: Detailed Vulnerability Map

Code
# Create detailed tract-level map.
# Align CRS.
vulnerable_tracts <- st_transform(vulnerable_tracts, crs = 4326)
pa_tracts <- st_transform(pa_tracts, crs = 4326)

# Filter by underserved tracts.
vulnerable_tracts <- vulnerable_tracts %>%
  rename("Underserved Tracts" = underserved)

# Choropleth map of PA census tracts colored by vulnerable and underserved.
choropleth_map <- ggplot(vulnerable_tracts) +
  geom_sf(
    aes(fill = `Underserved Tracts`),
  ) +
  geom_sf(
    data = pa_tracts, fill = NA,
    color = "midnightblue", linewidth = 0.25
  ) +
  geom_sf(
    data = pa_counties, fill = NA,
    color = "midnightblue", linewidth = 0.5
  ) +
  geom_point(
    data = pa_hospitals,
    aes(x = LONGITUDE, y = LATITUDE, color = "Hospitals"),
    alpha = 0.75, inherit.aes = FALSE
  ) +
  scale_fill_manual(
    # Custom color bar.
    name = "Tracts",
    values = c("TRUE" = "gold2", "FALSE" = "lemonchiffon2"),
    labels = c("TRUE" = "Underserved", "FALSE" = "Vulnerable"),
    # Reorder so that underserved is placed above vulnerable in the legend.
    breaks = c(TRUE, FALSE)
  ) +
  scale_color_manual(
    # Custom point addition to legend.
    name = "Hospitals",
    values = c("Hospitals" = "red2")
  ) +
  labs(
    title = "UNDERSERVED AND VULNERABLE CENSUS TRACTS",
    subtitle = "With Hospital Locations in Pennsylvania, United States",
    caption = "Source: 5-Year American Community Survey (ACS) 2019–2023",
  ) +
  theme_void()

choropleth_map + theme(
  panel.background = element_rect(fill = "azure3", size = 0.75),
  legend.box.margin = margin(l = 10, unit = "pt"),
  legend.title = element_text(face = "bold"),
  legend.text = element_text(family = "mono"),
  plot.title = element_text(face = "bold"),
  plot.subtitle = element_text(margin = margin(b = 10, t = 5, unit = "pt")),
  plot.caption = element_text(margin = margin(t = 10, unit = "pt"), face = "italic", family = "mono")
)

Chart 1: Distribution Analysis

Code
# Create distribution visualization.
# Make histogram of distance to nearest hospital.
histogram_hospital <- ggplot(vulnerable_tracts, aes(x = nearest_hospital_mi)) +
  geom_histogram(
    color = "black",
    fill = "mistyrose2"
  ) +
  labs(
    title = "DISTANCE TO NEAREST HOSPITAL",
    subtitle = "Histogram",
    x = "Distance to Nearest Hospital (mi)",
    y = "Count") +
  theme_minimal() +
  scale_x_continuous(labels = scales::label_number(suffix = " mi")) + # Format so tick numbers have miles at end.
  theme(aspect.ratio = 1/3) # Maintain aspect ratio.

# Make histogram of elderly population.
histogram_elderly <- ggplot(vulnerable_tracts, aes(x = age_65_overE)) +
  geom_histogram(
    color = "black",
    fill = "mistyrose2"
  ) +
  # Thousand separator.
  scale_x_continuous(labels = comma) +
  labs(
    title = "VULNERABLE POPULATION OF AGES 65+",
    subtitle = "Histogram",
    x = "Population of Ages 65+",
    y = "Count") +
  theme_minimal() +
  theme(aspect.ratio = 1/3) # Maintain aspect ratio.

# Create faceted plot of all three charts above in one column and three rows.
faceted_plot <-
  (histogram_hospital + theme(
    # Customize margins and text.
    plot.title = element_text(margin = margin(b = 10, unit = "pt"), face = "bold"),
    axis.title.x = element_text(margin = margin(t = 10, unit = "pt")),
    axis.title.y = element_text(margin = margin(r = 10, unit = "pt"))
  )) /
  (histogram_elderly + theme(
    # Customize margins and text.
    plot.title = element_text(margin = margin(t = 20, b = 10, unit = "pt"), face = "bold"),
    axis.title.x = element_text(margin = margin(t = 10, unit = "pt")),
    axis.title.y = element_text(margin = margin(r = 10, unit = "pt"))
  ))

# Display faceted plot.
faceted_plot

Of the vulnerable tracts, it looks like the distances to the nearest hospitals are generally less than 5 miles and the histogram is skewed right. From this visual distribution, it looks like there’s accessibility with vulnerable populations and drivable distance to the nearest hospital. The elderly population also look like they cluster around the 500 estimate, also skewing to the right significantly once the estimate reachs 1,000 or more.

PART III: BYO DATA ANALYSIS

Resource Allocation for Substance Use Analysis

Code
# CRS checks and transformations.

# Change data CRS to Pennsylvania South (EPSG 3365) for calculations.
sharps_boxes <- sharps_boxes %>%
  st_transform(crs = 3365)
treatment_facilities <- treatment_facilities %>%
  st_transform(crs = 3365)
free_meal_sites <- free_meal_sites %>%
  st_transform(crs = 3365)
philly_county <- philly_county %>%
  st_transform(crs = 3365)
philly_tracts <- philly_tracts %>%
  st_transform(crs = 3365)

crime_2024 <- crime_2024 %>%
  # Remove points where latitude and longitude are NA or empty.
  .[!(is.na(.$lng) | .$lng == "" | is.na(.$lat) | .$lat == ""),] %>%
  # Change to spatial dataframe and set CRS to 4326 because of coordinate data.
  st_as_sf(., coords = c("lng", "lat"), crs = 4326) %>%
  # Change to needed CRS for calculations.
  st_transform(crs = 3365)

# Filter 2024 crime data by substance-related uses.
substances_2024 <- crime_2024 %>%
  filter(text_general_code == "Narcotic / Drug Law Violations")
Code
# Clean data, filter, and remove unneeded columns.

sharps_boxes[, c("OBJECTID", "OBJECTID_1", "SITE_NAME", "SUB_NAME", "STATE")] <- list(NULL)

# Filter to Philadelphia
treatment_facilities <- treatment_facilities %>%
  filter(`COUNTY` == "PHILADELPHIA")
treatment_facilities[, c("OBJECTID", "FACILITY_I", "GEOCODING_", "STREET_2", "CITY", "STATE", "TELEPHONE_")] <- list(NULL)

# Filter to active sites.
free_meal_sites <- free_meal_sites %>%
  filter(`status` == "Active")
free_meal_sites[, c("objectid", "phone_numb", "temporary_", "site_key", "website", "hours_mon_", "hours_tues", "hours_wed_", "hours_thur", "hours_fri_", "hours_sat_", "hours_sa_1", "hours_sa_2", "hours_sun_", "email", "seasonally")] <- list(NULL)

substances_2024[, c("the_geom", "cartodb_id", "the_geom_webmercator", "objectid", "dc_dist", "psa", "dispatch_date_time", "dispatch_date", "dispatch_time", "hour", "dc_key", "ucr_general")] <- list(NULL)

philly_tracts[, c("STATEFP", "COUNTYFP", "GEOIDFQ", "NAME", "NAMELSAD", "STUSPS", "TRACTCE", "STATE_NAME", "LSAD", "TRACT")] <- list(NULL)
colnames(philly_tracts)[2] <- "county"
  • What dataset was chosen and why?

  • What is the data source and date?

    • The crime data is from 2024, free meal sites from end of last year, and facilities and sharps bins data are from end of year 2023. All of these data had no official creation dates on their pages, but the aforementioned years were observed from the public GitHub repository. All data were extracted from OpenDataPhilly’s website.
  • How many features does it contain?

    • The crime data has 160,388 features, free meals site data has 304 features, treatment facilities data has 965 features, and sharps box data has 27 features.
  • What CRS is it in? Was transforming needed?

    • The crime data has no CRS but data having coordinates indicates WGS84, free meals site data is in WGS84 / Pseudo-Mercator (EPSG 3857), treatment facilities is in WGS84 / World Geodetic System 1984 (EPSG 4326), and sharps box data is in WGS84 / Pseudo-Mercator (EPSG 3857). Both the Philadelphia tract and county data are in WGS84 / World Geodetic System 1984 (EPSG 4326). I made sure to transform all data to Pennsylvania South (EPSG 3365) for calculations. For the crime data without the CRS, I had to turn it into a geodataframe and change the CRS to WGS84 first for that function, and then transform the CRS again to Pennsylvania South.
  1. Research Question
  • Do tracts with high substance-related instances have adequate access to treatment facilities, sharps bins, and free meal programs?
  1. Conduct spatial analysis
Code
# Aggregate and count substance use in Philadelphia census tracts.
# Use pipe to input st_intersects return values into lengths() function.
philly_tracts$substance_use <- st_intersects(philly_tracts, substances_2024) |> lengths()

# Normalize substance use data by population total.
philly_tracts <- philly_tracts %>%
  mutate(pct_substance_use = case_when(
    # Calculate as 0 when both values are 0.
    total_popE == 0 & substance_use == 0 ~ 0,
    # Calculate as NA with numeric type rather than NA with logical type to avoid type mismatch.
    total_popE == 0 ~ NA_real_,
    # Otherwise, calculate normally.
    TRUE ~ (substance_use / total_popE) * 100
  ))

# Determine high substance use tracts as above 50% of bell curve.
philly_tracts <- philly_tracts %>%
  mutate("high_substance_use" = case_when(
    .$pct_substance_use > 0.20539 ~ TRUE,
    TRUE ~ FALSE
  ))

# Filter tracts with substance uses above the median.
substance_tracts <- philly_tracts %>%
  filter(.$high_substance_use == TRUE)
Code
# Create census tract centroids for substance_tracts.
substance_centroids <- st_centroid(substance_tracts)

# Calculate distances between centroids and food accessibility.
nearest_meal_index <- st_nearest_feature(substance_centroids, free_meal_sites)

nearest_meal_point <- free_meal_sites[nearest_meal_index,]

meal_distance <- st_distance(substance_centroids, nearest_meal_point, by_element = TRUE)

substance_tracts$nearest_meal_mi <- set_units(meal_distance, "mi")

# Calculate distances between centroids and treatment facilities.
nearest_facility_index <- st_nearest_feature(substance_centroids, treatment_facilities)

nearest_facility_point <- treatment_facilities[nearest_facility_index,]

facility_distance <- st_distance(substance_centroids, nearest_facility_point, by_element = TRUE)

substance_tracts$nearest_facility_mi <- set_units(facility_distance, "mi")

# Calculate distances between centroids and sharps bins.
nearest_sharps_index <- st_nearest_feature(substance_centroids, sharps_boxes)

nearest_sharps_point <- sharps_boxes[nearest_sharps_index,]

sharps_distance <- st_distance(substance_centroids, nearest_sharps_point, by_element = TRUE)

substance_tracts$nearest_sharps_mi <- set_units(sharps_distance, "mi")

# Determine how many tracts' nearest free meal program is more than 0.5 mi. away.
tracts_lacking_meals <- sum(substance_tracts$nearest_meal_mi > set_units(0.5, "mi"))
# Determine how many tracts' nearest treatment facility is more than 0.5 mi. away.
tracts_lacking_facilities <- sum(substance_tracts$nearest_facility_mi > set_units(0.5, "mi"))
# Determine how many tracts' nearest sharps bin is more than 0.5 mi. away.
tracts_lacking_sharps <- sum(substance_tracts$nearest_sharps_mi > set_units(0.5, "mi"))

print(paste0("There are ", tracts_lacking_meals, " census tracts that don't have active free meal programs within 0.5 mi. walking distance."))
[1] "There are 7 census tracts that don't have active free meal programs within 0.5 mi. walking distance."
Code
print(paste0("There are ", tracts_lacking_facilities, " census tracts that don't have substance alcohol treatment facilities within 0.5 mi. walking distance."))
[1] "There are 13 census tracts that don't have substance alcohol treatment facilities within 0.5 mi. walking distance."
Code
print(paste0("There are ", tracts_lacking_sharps, " census tracts that don't have disposable sharps bins within 0.5 mi. walking distance."))
[1] "There are 47 census tracts that don't have disposable sharps bins within 0.5 mi. walking distance."
Code
# Map tracts more than 0.5 mile walking distance from free food programs.
far_from_meals_tracts <- substance_tracts %>%
  mutate(nearest_meal_mi = drop_units(nearest_meal_mi)) %>%
  filter(nearest_meal_mi > 0.5)

# Transform geography data back to WGS84 for web mapping.
far_from_meals_tracts <- far_from_meals_tracts %>%
  st_transform(crs = 4326)

# Choropleth map colored by substance uses with treatment facility points.
far_from_meals_choropleth <- ggplot(far_from_meals_tracts) +
  geom_sf(
    aes(fill = `pct_substance_use`),
  ) +
  geom_sf(
    data = philly_tracts, fill = NA,
    color = "black", linewidth = 0.25,
    alpha = 0.5
  ) +
  geom_point(
    data = free_meal_sites,
    aes(x = x, y = y, color = "Free Meals"),
    alpha = 0.75, inherit.aes = FALSE
  ) +
  scale_fill_viridis_c(
    name = "Percent Substance Events",
    labels = function(x) paste0(x, "%")
  ) +
  scale_color_manual(
    name = "Free Meals",
    values = c("Free Meals" = "red2")
  ) +
  labs(
    title = "HIGH DRUG EVENTS IN PHILADELPHIA, PA",
    subtitle = "More than 0.5 mi from Free Meals",
    caption = "Sources: 5-Year ACS 2023, OpenDataPhilly 2023–2025",
  ) +
  theme_void()

far_from_meals_choropleth + theme(
  panel.background = element_rect(fill = "azure2", size = 0.75),
  legend.box.margin = margin(l = 10, unit = "pt"),
  legend.title = element_text(face = "bold"),
  legend.text = element_text(face = "italic"),
  plot.title = element_text(face = "bold"),
  plot.subtitle = element_text(margin = margin(b = 10, t = 5, unit = "pt")),
  plot.caption = element_text(margin = margin(t = 10, unit = "pt"), face = "italic", family = "mono")
)

Code
# Map tracts more than 0.5 mile walking distance from treatment facilities.
far_from_facilities_tracts <- substance_tracts %>%
  mutate(nearest_facility_mi = drop_units(nearest_facility_mi)) %>%
  filter(nearest_facility_mi > 0.5)

# Transform geography data back to WGS84 for web mapping.
far_from_facilities_tracts <- far_from_facilities_tracts %>%
  st_transform(crs = 4326)

# Choropleth map colored by substance uses with treatment facility points.
far_from_facilities_choropleth <- ggplot(far_from_facilities_tracts) +
  geom_sf(
    aes(fill = `pct_substance_use`),
  ) +
  geom_sf(
    data = philly_tracts, fill = NA,
    color = "black", linewidth = 0.25,
    alpha = 0.5
  ) +
  geom_point(
    data = treatment_facilities,
    aes(x = LONGITUDE, y = LATITUDE, color = "Treatment Facilities"),
    alpha = 0.75, inherit.aes = FALSE
  ) +
  scale_fill_viridis_c(
    name = "Percent Substance Events",
    labels = function(x) paste0(x, "%")
  ) +
  scale_color_manual(
    name = "Treatment Facilities",
    values = c("Treatment Facilities" = "red2")
  ) +
  labs(
    title = "HIGH DRUG EVENTS IN PHILADELPHIA, PA",
    subtitle = "More than 0.5 mi from Treatment Facilities",
    caption = "Sources: 5-Year ACS 2023, OpenDataPhilly 2023–2025",
  ) +
  theme_void()

far_from_facilities_choropleth + theme(
  panel.background = element_rect(fill = "azure2", size = 0.75),
  legend.box.margin = margin(l = 10, unit = "pt"),
  legend.title = element_text(face = "bold"),
  legend.text = element_text(face = "italic"),
  plot.title = element_text(face = "bold"),
  plot.subtitle = element_text(margin = margin(b = 10, t = 5, unit = "pt")),
  plot.caption = element_text(margin = margin(t = 10, unit = "pt"), face = "italic", family = "mono")
)

Code
# Map tracts more than 0.5 mile walking distance from sharps bins.
far_from_sharps_tracts <- substance_tracts %>%
  mutate(nearest_sharps_mi = drop_units(nearest_sharps_mi)) %>%
  filter(nearest_sharps_mi > 0.5)

# Align CRS.
far_from_sharps_tracts <- st_transform(far_from_sharps_tracts, 4326)
sharps_boxes <- st_transform(sharps_boxes, 4326)

# Extract coordinates from geometry to new columns for plotting.
sharps_boxes <- sharps_boxes %>%
  mutate(
    lon = st_coordinates(geometry)[, 1],
    lat = st_coordinates(geometry)[, 2]
  )

# Choropleth map colored by substance uses with sharps bin points.
sharps_choropleth <- ggplot(far_from_sharps_tracts) +
  geom_sf(
    aes(fill = `pct_substance_use`),
  ) +
  geom_sf(
    data = philly_tracts, fill = NA,
    color = "black", linewidth = 0.25,
    alpha = 0.5
  ) +
  geom_point(
    data = sharps_boxes,
    aes(x = lon, y = lat, color = "Sharps Bins"),
    alpha = 0.75, inherit.aes = FALSE
  ) +
  scale_fill_viridis_c(
    name = "Percent Substance Events",
    labels = function(x) paste0(x, "%")
  ) +
  scale_color_manual(
    name = "Sharps Bins",
    values = c("Sharps Bins" = "red2")
  ) +
  labs(
    title = "HIGH DRUG EVENTS IN PHILADELPHIA, PA",
    subtitle = "More than 0.5 mi from Sharps Bins",
    caption = "Sources: 5-Year ACS 2023, OpenDataPhilly 2023–2025",
  ) +
  theme_void()

sharps_choropleth + theme(
  panel.background = element_rect(fill = "azure2", size = 0.75),
  legend.box.margin = margin(l = 10, unit = "pt"),
  legend.title = element_text(face = "bold"),
  legend.text = element_text(face = "italic"),
  plot.title = element_text(face = "bold"),
  plot.subtitle = element_text(margin = margin(b = 10, t = 5, unit = "pt")),
  plot.caption = element_text(margin = margin(t = 10, unit = "pt"), face = "italic", family = "mono")
)

Code
# Align CRS.
substance_tracts <- st_transform(substance_tracts, 4326)

# Choropleth map colored by substance uses with sharps bin points and treatment facilities.
all_choropleth <- ggplot(substance_tracts) +
  geom_sf(
    aes(fill = `pct_substance_use`),
  ) +
  geom_sf(
    data = philly_tracts, fill = NA,
    color = "azure2", linewidth = 0.25,
    alpha = 0.5
  ) +
  geom_point(
    data = treatment_facilities,
    aes(x = LONGITUDE, y = LATITUDE, color = "Treatment Facilities"),
    alpha = 0.8, inherit.aes = FALSE
  ) +
  geom_point(
    data = sharps_boxes,
    aes(x = lon, y = lat, color = "Sharps Bins"),
    alpha = 0.8, inherit.aes = FALSE
  ) +
  scale_fill_viridis_c(
    name = "Percent Substance Events",
    labels = function(x) paste0(x, "%")
  ) +
  scale_color_manual(
    name = "Harm Reduction",
    values = c("Treatment Facilities" = "orange","Sharps Bins" = "red2")
  ) +
  labs(
    title = "HIGH DRUG EVENTS IN PHILADELPHIA, PA",
    subtitle = "With Sharps Bins and Treatment Facilities",
    caption = "Sources: 5-Year ACS 2023, OpenDataPhilly 2023–2025",
  ) +
  theme_void()

all_choropleth + theme(
  panel.background = element_rect(fill = "azure4", size = 0.75),
  legend.box.margin = margin(l = 10, unit = "pt"),
  legend.title = element_text(face = "bold"),
  legend.text = element_text(face = "italic"),
  plot.title = element_text(face = "bold"),
  plot.subtitle = element_text(margin = margin(b = 10, t = 5, unit = "pt")),
  plot.caption = element_text(margin = margin(t = 10, unit = "pt"), face = "italic", family = "mono")
)

Code
# Summary statistics of data used.

data_used <- substance_tracts %>%
  transmute(
  "GEOID" = GEOID,
  "Population Estimate" = total_popE,
  "Population MOE" = total_popM,
  "Substance Incidence" = substance_use,
  "Percent Substance Incidence" = pct_substance_use,
  "Nearest Meal (mi)" = nearest_meal_mi,
  "Nearest Facility (mi)" = nearest_facility_mi,
  "Nearest Sharps (mi)" = nearest_sharps_mi
)

# Align CRS.
data_used <- st_transform(data_used, 4326)
free_meal_sites <- st_transform(free_meal_sites, 4326)
sharps_boxes <- st_transform(sharps_boxes, 4326)
treatment_facilities <- st_transform(treatment_facilities, 4326)

# Only get geometry of point data.
selected_meals <- free_meal_sites %>%
  transmute(geometry)

selected_sharps <- sharps_boxes %>%
  select(geometry)

selected_facilities <- treatment_facilities %>%
  select(geometry)

# Task is repetitive, so create a function to help with counting all the meal, facility, and sharps bin point data for each tract.
count_points <- function(point_data, tract_data, id = "GEOID", name = "Count"){
  result <- point_data %>%
    # Join using within predicate.
    st_join(tract_data, join = st_within) %>%
    # Drop point geometry to avoid duplication when joining back with tracts later.
    st_drop_geometry() %>%
    # Group by census tract GEOID.
    group_by(.data[[id]]) %>%
    # Count instances of points in each GEOID.
    summarize(Count = n())
  
  # Change column name to user input name.
  names(result)[names(result) == "Count"] <- name
  
  # Return the result.
  result
}

# Use function on all point datasets.
number_meals <- count_points(free_meal_sites, data_used, "GEOID", "Number of Meals")
number_sharps <- count_points(sharps_boxes, data_used, "GEOID", "Number of Sharps")
number_facilities <- count_points(treatment_facilities, data_used, "GEOID", "Number of Facilities")

# Add counts back into tract data using left join.
data_used <- data_used %>%
  left_join(number_meals, by = "GEOID") %>%
  left_join(number_sharps, by = "GEOID") %>%
  left_join(number_facilities, by = "GEOID") %>%
  mutate(
    # Fill empty cells with 0.
    `Number of Meals` = replace_na(`Number of Meals`, 0),
    `Number of Sharps` = replace_na(`Number of Sharps`, 0),
    `Number of Facilities` = replace_na(`Number of Facilities`, 0)
  )

# Create dataframe from geodataframe for all data used.
data_used_df <- data_used %>%
  st_drop_geometry() %>%
  mutate(
    GEOID = NULL
  )
Code
# No scientific notation.
options(scipen = 999)

# Tidy summary statistics for kable.
summary_stats <- data_used_df %>%
  # Use map function to use summary function on each column,
  # generating descriptive statistics for the values and storing in a dataframe.
  map(summary) %>%
  # Convert output list to tibble, tidying and stacking rows.
  map_df(tidy) %>%
  # Use original column names as variable values and place "variable" as the first column.
  mutate(variable = names(data_used_df), .before = 1)
Code
# Kable of summary statistics.
kable(
  as.data.frame(summary_stats),
  align = "c",
  digit = 2,
  format.args = list(big.mark = ","),
  col.names = c("VARIABLE", "MIN", "Q1", "MEDIAN", "MEAN", "Q3", "MAX"),
  caption = "<b>DRUG-RELATED DATA IN PHILADELPHIA, PA: Summary Statistics</b>"
) %>%
  # Shade every other row for ease of reading.
  kable_styling(bootstrap_options = "striped") %>%
  column_spec(1, bold = TRUE) %>%
  row_spec(0, color = "white", background = "black")
DRUG-RELATED DATA IN PHILADELPHIA, PA: Summary Statistics
VARIABLE MIN Q1 MEDIAN MEAN Q3 MAX
Population Estimate 50.00 2,537.00 3,207.00 3,561.02 4,410.00 8,835.00
Population MOE 44.00 446.00 690.00 752.34 965.00 1,800.00
Substance Incidence 4.00 10.00 15.00 36.02 23.00 519.00
Percent Substance Incidence 0.21 0.33 0.46 1.24 0.63 14.77
Nearest Meal (mi) 0.02 0.12 0.19 0.25 0.28 1.57
Nearest Facility (mi) 0.02 0.22 0.31 0.36 0.45 1.13
Nearest Sharps (mi) 0.00 0.42 1.49 1.49 2.19 3.72
Number of Meals 0.00 0.00 1.00 1.34 2.00 7.00
Number of Sharps 0.00 0.00 0.00 0.31 0.00 5.00
Number of Facilities 0.00 0.00 0.00 0.69 1.00 8.00

Many census tracts with higher percents of substance-related incidents lie in North Philly, Near-Northeast Philly, and West Philly primarily above Market Street. It seems like Philadelphia is taking the substance-related concerns seriously, the substance-alcohol treatment facilities and active free meal programs are peppered all around the county and have good coverage over tracts with higher substance-related events.

It also looks like most meals and facilities are within walking distance. However, sharps bins don’t have too much spatial overlap and are lacking statistically, with average distances beyond a mile. It may be worth implementing more sharps bin sites around tracts with more substance-use events.

While facilities may have disposals, many substance-users range from recreational to addicted and might fear arrest at facilities and even at sharps bins themselves, so it’s important to address the issue without any demonizing, dehumanizing, and criminalizing.

This quick look at resource distribution has policy relevance to community investment, harm reduction, siting, food programming, and quality of life for everyone, not just those directly affected by substance-use.

Feedback Incorporation

  • I removed a lot of the extra instruction text in this write-up to make the report more concise and straightforward.
  • Any printouts and summaries created outside of kable were commented out or removed to give the report more clarity and not fill it with extra text or numbers.
  • I broke up long paragraphs when making any comments or analyses and shortened figure titles for readability.

NOTE: Credit and AI Usage

No AI usage.

Base code template provided by Dr. Delmelle with adjustments.

Visuals taken from previous labs and lecture examples and modified.